
[TRACIER - DO NO MERGE] feat: scenario tests for nightly CI runs#72

Closed
redpanda-f wants to merge 62 commits into main from feat/redpanda/feat-scenarios-py

@redpanda-f
Contributor

@redpanda-f redpanda-f commented Mar 3, 2026

Summary

NOTE: THIS PR IS IN DRAFT SINCE IT HAS BEEN ALMOST ENTIRELY DECOMPOSED INTO SMALLER PRs

Introduces:

  • a Python-based scenario testing framework,
  • a full CI overhaul with nightly stability/frontier matrix runs (latesttag testing and latestcommit testing) and automated GitHub issue reporting,
  • a new latesttag location type for foc-devnet dependencies,
  • linting infrastructure,
  • and several bug fixes.
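The latesttag location type is named here but not specified. As a hedged illustration, this is one way the <kind>:<value> specs used in the nightly matrix (e.g. --curio latesttag:pdpv0) could be parsed and mapped to a git lookup. Reading latesttag:X as "the most recent tag reachable from branch X" is an assumption, and parse_location, parse_init_flags and resolve_command are hypothetical names, not foc-devnet's code.

```python
from typing import Dict, List, NamedTuple

# Location kinds seen in the nightly matrix; the real CLI may accept more.
KNOWN_KINDS = {"latesttag", "gitbranch"}


class Location(NamedTuple):
    kind: str   # e.g. "latesttag"
    value: str  # e.g. "pdpv0"


def parse_location(spec: str) -> Location:
    """Split a '<kind>:<value>' spec such as 'latesttag:pdpv0'."""
    kind, sep, value = spec.partition(":")
    if not sep or not value:
        raise ValueError(f"expected <kind>:<value>, got {spec!r}")
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown location kind {kind!r}")
    return Location(kind, value)


def parse_init_flags(flags: str) -> Dict[str, Location]:
    """Map '--curio latesttag:pdpv0 ...' to {'curio': Location(...), ...}."""
    tokens = flags.split()
    if len(tokens) % 2:
        raise ValueError("flags must come in '--name spec' pairs")
    result = {}
    for flag, spec in zip(tokens[::2], tokens[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a --flag, got {flag!r}")
        result[flag[2:]] = parse_location(spec)
    return result


def resolve_command(loc: Location) -> List[str]:
    """Hypothetical git command that would pin a location to a single ref."""
    if loc.kind == "latesttag":
        # Assumed meaning: most recent tag reachable from the named branch.
        return ["git", "describe", "--tags", "--abbrev=0", loc.value]
    # gitbranch: whatever commit the branch tip currently points at.
    return ["git", "rev-parse", loc.value]
```

Keeping parsing separate from resolution means a frontier spec and a stability spec differ only in the kind field, which is what the matrix entries vary.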

Reviewers may want to look at the following pull requests, one at a time, for smaller, discrete additions:

Base PRs:

Linear, interdependent PRs, to finish up:

Testing infra

  • Python-based testing infrastructure with no dependencies beyond the Python 3 standard library.
  • run.py is the "core", shared testing infrastructure for error reporting and utility functions.
  • Four tests are introduced:
    • Added scenarios/test_containers.py — verifies all expected Docker containers are running.
    • Added scenarios/test_basic_balances.py — checks Lotus wallet and FEVM account balances.
    • Added scenarios/test_storage_e2e.py — end-to-end storage scenario against the live devnet.
    • Added scenarios/test_caching_subsystem.py — validates the caching subsystem via CQL in curio.
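The contents of test_containers.py are not shown in this PR. As a rough, stdlib-only sketch of such a check, a scenario could compare the names reported by docker ps against an expected set; running_containers, parse_names and missing_containers are hypothetical helpers, and any expected container names would come from the devnet's own configuration.

```python
import subprocess
from typing import Iterable, List


def parse_names(ps_output: str) -> List[str]:
    """One container name per non-empty line of `docker ps` output."""
    return [line.strip() for line in ps_output.splitlines() if line.strip()]


def running_containers() -> List[str]:
    """Names of currently running containers, as reported by `docker ps`."""
    out = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_names(out)


def missing_containers(expected: Iterable[str], running: Iterable[str]) -> List[str]:
    """Expected names that are not running, sorted for stable reporting."""
    running_set = set(running)
    return sorted(name for name in expected if name not in running_set)
```

Splitting the docker call from the comparison keeps the comparison testable without a live devnet.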

CI Fixes

  • Continuous integration needed an overhaul to support nightly runs, among other things.
  • ci.yml is now split into:
    • ci_run.yml: the core "meat" of the task execution
    • ci_pull_request.yml: triggered for all PRs raised against main
    • ci_nightly.yml: triggered on a cron schedule, nightly at 03:00 UTC (cron: '0 3 * * *')
      • Two distinct configurations are run:
        • frontier: tests the latest commits of the dependent curio and filecoin-services; lotus is locked at its latest tag.
        • stability: tests the latest tags of the dependent curio and filecoin-services; lotus is locked at its latest tag.
  • Nightly issue labels in use:
    • Run reports are filed as, or appended to, issues carrying the following labels:
      • scenarios-run-frontier
      • scenarios-run-stability
    • Ref (nightly matrix):
      - name: stability
        init_flags: "--curio latesttag:pdpv0 --filecoin-services latesttag:main"
        issue_label: scenarios-run-stability
        issue_title: "FOC Devnet scenarios run report (stability)"
      - name: frontier
        init_flags: "--curio gitbranch:pdpv0 --filecoin-services gitbranch:main"
        issue_label: scenarios-run-frontier
        issue_title: "FOC Devnet scenarios run report (frontier)"
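A minimal sketch of the label-based issue reporting described above, using the GitHub REST issues endpoints through urllib. The PR does not show how the nightly workflow actually files reports, so the helper names and the policy ("comment on the first open issue with the matrix label, otherwise open a new issue with issue_title") are assumptions. The token is a placeholder, and the transport parameter exists only so the logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Optional

API = "https://api.github.com"


def _request(method: str, path: str, token: str,
             payload: Optional[dict] = None,
             query: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated GitHub REST request."""
    url = API + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Accept", "application/vnd.github+json")
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req: urllib.request.Request) -> Any:
    """Send a request and decode its JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def report(repo: str, label: str, title: str, body: str, token: str,
           transport: Callable[[urllib.request.Request], Any] = send) -> Any:
    """Append `body` to the first open issue with `label`, or open a new one."""
    issues = transport(_request("GET", f"/repos/{repo}/issues", token,
                                query={"labels": label, "state": "open"}))
    if issues:
        number = issues[0]["number"]
        return transport(_request("POST", f"/repos/{repo}/issues/{number}/comments",
                                  token, payload={"body": body}))
    return transport(_request("POST", f"/repos/{repo}/issues", token,
                              payload={"title": title, "body": body, "labels": [label]}))
```

Appending to a single long-lived issue per label keeps the nightly history for frontier and stability in one place each, instead of opening a new issue every night.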

--noterminal Flag for version Command

  • Added --noterminal boolean flag to foc-devnet version.
  • When set, suppresses tracing prefixes so output can be captured cleanly by scripts and
    scenario reporters.
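Scenario reporters can call foc-devnet version --noterminal directly. For output captured from an older binary that lacks the flag, one hedged fallback is to strip the ANSI color codes and the tracing "LEVEL file:line:" prefix. The prefix shape below is inferred from the sample quoted later in this thread, and clean_line and version_lines are hypothetical names.

```python
import re
import subprocess
from typing import List

# ANSI SGR color sequences such as "\x1b[32m" and "\x1b[0m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")
# Assumed tracing prefix after colors are removed: " INFO src/file.rs:19: ".
TRACING_PREFIX = re.compile(r"^\s*(?:TRACE|DEBUG|INFO|WARN|ERROR)\s+\S+:\d+:\s?")


def clean_line(line: str) -> str:
    """Remove color codes, then at most one leading tracing prefix."""
    return TRACING_PREFIX.sub("", ANSI_ESCAPE.sub("", line), count=1)


def version_lines(binary: str = "foc-devnet") -> List[str]:
    """Capture `version --noterminal` output; clean_line is a no-op on plain lines."""
    out = subprocess.run(
        [binary, "version", "--noterminal"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [clean_line(line) for line in out.splitlines()]
```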

Bug Fixes & Dependency Updates

  • Deploy script rename: updated all references from deploy-all-warm-storage.sh to
    warm-storage-deploy-all.sh to match the upstream rename in filecoin-services.
    • This raises a larger question: what should we do when "frontier" has moved to a different setup (code-wise, in foc-devnet) than "stability"? cc @rvagg
  • YugabyteDB: removed the --advertise_address=0.0.0.0 flag, which caused address-binding
    conflicts.
  • YCQL port in devnet-info: added ycql_port field to YugabyteInfo and populated it
    from the step context, exposing the CQL port (9042) to consumers of devnet-info.json.
  • Default dependency versions:
    • Lotus: v1.34.4-rc1 → v1.35.0
    • Curio: 4d53c80... → e109aec7...
    • filecoin-services: 2b247916... → c79e2c6b...
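A consumer-side sketch for the new ycql_port field. The exact nesting of YugabyteInfo inside devnet-info.json is not shown here, so rather than hard-coding a path, the helper searches the parsed JSON for the key. The file location is passed in by the caller, and find_key and read_ycql_port are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any, Optional


def find_key(obj: Any, key: str) -> Optional[Any]:
    """Depth-first search for `key` in nested dicts and lists; None if absent."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def read_ycql_port(info_path: Path) -> int:
    """Load devnet-info.json from `info_path` and return the CQL port."""
    info = json.loads(Path(info_path).read_text())
    port = find_key(info, "ycql_port")
    if port is None:
        raise KeyError(f"no ycql_port in {info_path}")
    return int(port)
```

Once the schema is settled, replacing find_key with a direct lookup would give clearer errors; the search is only a hedge against the unknown layout.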

Linting Infrastructure

  • Added scripts/lint.sh — unified linter running cargo fmt, cargo clippy, and black
    in either check mode (FIX=0) or auto-fix mode (FIX=1).
  • Added scripts/install_precommit_hooks.sh — installs a git pre-commit hook that calls
    lint.sh before each commit; supports git worktrees.
  • Updated .gitignore to exclude __pycache__/ directories and .githooks/.
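A Python rendering of the FIX=0 / FIX=1 switch in scripts/lint.sh, as a sketch only. The script itself is shell and its exact arguments are not shown, so the flags below are assumptions, although each one is a real cargo fmt, cargo clippy or black option. lint_commands and run_lint are hypothetical names.

```python
import os
import subprocess
from typing import List, Mapping


def lint_commands(env: Mapping[str, str]) -> List[List[str]]:
    """Check-only commands for FIX=0 (the default), auto-fix commands for FIX=1."""
    if env.get("FIX", "0") == "1":
        return [
            ["cargo", "fmt", "--all"],
            ["cargo", "clippy", "--fix", "--allow-dirty", "--allow-staged"],
            ["black", "."],
        ]
    return [
        ["cargo", "fmt", "--all", "--", "--check"],
        ["cargo", "clippy", "--", "-D", "warnings"],
        ["black", "--check", "."],
    ]


def run_lint() -> int:
    """Run every command, echo each one, and return 1 if any of them failed."""
    failures = 0
    for cmd in lint_commands(os.environ):
        print("+", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            failures += 1
    return 1 if failures else 0
```

Running all linters before failing reports every problem in one pass. Because git aborts a commit when a pre-commit hook exits non-zero, the hook only needs to propagate this exit code.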

Pre-merge checklist:

Post-merge checklist:

  • Ensure at least one nightly run completed correctly. Stability may fail for a while, until filecoin-services has a tagged commit.
  • Ensure a manual workflow run with inputs works.

@FilOzzy FilOzzy added this to FOC Mar 3, 2026
@github-project-automation github-project-automation bot moved this to 📌 Triage in FOC Mar 3, 2026
@redpanda-f redpanda-f changed the base branch from feat/redpanda/feat-scenarios to main March 3, 2026 09:43
@redpanda-f redpanda-f changed the base branch from main to feat/redpanda/feat-scenarios March 3, 2026 09:44
@redpanda-f redpanda-f self-assigned this Mar 3, 2026
@redpanda-f redpanda-f moved this from 📌 Triage to ⌨️ In Progress in FOC Mar 3, 2026

if not py or py == "":
    info(f"Installing custom python version {PYTHON_VERSION}")
    info(sh(f"npm i --global --silent @bjia56/portable-python-{PYTHON_VERSION}"))
Contributor


hah what? installing python via npm? pyception. Let's not do this; set up the container's environment in the action, and don't install system tools in the test itself

Contributor Author


This is very weird indeed. But ycql needs a very specific Python interpreter and does not like the Python version that was on my machine (too high a version, according to it). If we set up the container environment to match the Python version required here, we constrain the entire run because of this one tool, and this knowledge may eventually be lost down the line.

The issue here is that we want "scenario isolation", and portable Python seems to me the right way to get it. Agreed, Python via npm is not the best. Do you have better alternatives?

Contributor Author


Added a new script, and removed the setup from the scenarios.
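For the interpreter-version constraint discussed in this thread, here is a hedged sketch of one alternative to installing Python at test time: probe already-installed candidate interpreters and use the first one whose version is below an upper bound. The candidate names and the bound are placeholders, since the thread does not state which Python versions ycql accepts. interpreter_version and find_python are hypothetical names.

```python
import shutil
import subprocess
from typing import Iterable, Optional, Tuple


def interpreter_version(exe: str) -> Tuple[int, int]:
    """(major, minor) of the interpreter at `exe`, asked from the interpreter itself."""
    out = subprocess.run(
        [exe, "-c", "import sys; print(sys.version_info[0], sys.version_info[1])"],
        check=True, capture_output=True, text=True,
    ).stdout
    major, minor = out.split()
    return int(major), int(minor)


def find_python(candidates: Iterable[str], max_exclusive: Tuple[int, int]) -> Optional[str]:
    """First candidate on PATH (or given as a path) older than `max_exclusive`."""
    for name in candidates:
        exe = shutil.which(name)
        if exe is None:
            continue
        if interpreter_version(exe) < max_exclusive:
            return exe
    return None
```

A setup script could call something like find_python(["python3.11", "python3"], (3, 12)) and fail with a clear message when it returns None. Keeping the version bound in one named place also addresses the concern that this knowledge may get lost.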

Comment on lines +63 to +64
_fail += 1
sys.exit(1)
Contributor


eh, I don't think incrementing _fail here is very helpful, since we exit right after; do we want to print it maybe?
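Here is a sketch of one way to address this review point. check and summary are hypothetical helpers, not run.py's actual API. The idea is to record every result, then print the pass/fail counts and the names of the failing checks before choosing the exit code, so the failure count is reported instead of being incremented just before exiting.

```python
from typing import List, Tuple

# (check name, passed?) for every check recorded in this scenario run.
_results: List[Tuple[str, bool]] = []


def check(name: str, ok: bool) -> bool:
    """Record and print one result; keep going instead of exiting on failure."""
    _results.append((name, ok))
    print(f"[{'PASS' if ok else 'FAIL'}] {name}")
    return ok


def summary() -> int:
    """Print totals plus failing check names; return the process exit code."""
    failed = [name for name, ok in _results if not ok]
    print(f"{len(_results) - len(failed)} passed, {len(failed)} failed")
    for name in failed:
        print(f"  failed: {name}")
    return 1 if failed else 0
```

A scenario would call check(...) for each assertion and end with sys.exit(summary()). A check that makes later steps meaningless can still exit early, but only after calling summary().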

@redpanda-f
Contributor Author

cc @rvagg this Actions run is a classic case of "frontier" foc-devnet no longer supporting "stability". So far I have no good answer here, apart from fairly complicated foc-devnet versioning. I would like your opinion.

Reference Actions run: https://github.com/FilOzone/foc-devnet/actions/runs/22837909291/job/66237925146?pr=72

redpanda-f and others added 2 commits March 9, 2026 11:15
Co-authored-by: Rod Vagg <rod@vagg.org>
@redpanda-f
Contributor Author

A couple of things when looking at #75

  1. Can we include the synapse commit that was used as well?

That is a scenario-specific property, not a property of foc-devnet as a whole. The logs for the specific scenario include it. ref:

sdk_commit = sh(f"git -C {sdk_dir} rev-parse HEAD")
info(f"synapse-sdk commit: {sdk_commit}")

  2. (nit) Status has a period after it ("Status."). We can remove the period.

Removed

  3. For the version info, at a minimum can we remove the prefix that is in the log messages (e.g., "\x1b[32m INFO\x1b[0m \x1b[2msrc/main_app/version.rs\x1b[0m\x1b[2m:\x1b[0m\x1b[2m19:\x1b[0m ", where \x1b is the ANSI escape character)? (Even better would be to turn it into a simpler bullet list or table, but that's not essential.)

yup, a --noterminal flag has been added for this and should take effect from the next runs. ref:

foc-devnet/src/cli.rs

Lines 74 to 78 in f610d13

Version {
    /// Print plain output without tracing prefixes
    #[arg(long)]
    noterminal: bool,
},

  4. CI run link: can we link directly to the corresponding "job" rather than just to the "run"? For example, link to the stability job.

Introduced; it should be available from the next runs. ref:

foc-devnet/scenarios/run.py

Lines 309 to 321 in 8692af2

# Print CI run URL in stdout if available
if os.environ.get("GITHUB_RUN_ID") and os.environ.get("GITHUB_REPOSITORY"):
    github_server = os.environ.get("GITHUB_SERVER_URL", "https://github.com")
    ci_url = f"{github_server}/{os.environ.get('GITHUB_REPOSITORY')}/actions/runs/{os.environ.get('GITHUB_RUN_ID')}"
    ci_url_type = "run"
    # If CI run is available, we may also have `GITHUB_CI_JOB_ID` set, see `ci_run.yml`
    # This env var allows us to link to specific jobs instead of the entire run
    if os.environ.get("GITHUB_CI_JOB_ID"):
        ci_url = f"{github_server}/{os.environ.get('GITHUB_REPOSITORY')}/actions/runs/{os.environ.get('GITHUB_RUN_ID')}/job/{os.environ.get('GITHUB_CI_JOB_ID')}"
        ci_url_type = "job"
    print(f"CI {ci_url_type}: {ci_url}")
sys.exit(0 if scenario_fail == 0 else 1)

  5. For the test summaries, can we link to the corresponding step as well? For example, I would expect to see a URL with an anchor like #step:39:26 (e.g., https://github.com/FilOzone/foc-devnet/actions/runs/22752589844/job/65990079825#step:39:27)

From what I could gather, this is trickier: it requires exact knowledge of which step and log line we are at in GitHub's logging. I think the job link should be enough to navigate.

@redpanda-f
Contributor Author

redpanda-f commented Mar 9, 2026

A couple of things:

  1. I assume a better PR description will be provided

yes, added

  2. I was also wondering how to break this down into smaller units so parts can start getting merged in smaller chunks. I could imagine a PR for the "latesttag" functionality. Maybe another for the githooks and linting, and then the main PR that adds the scenarios?

Many sub-items have been split out into separate PRs.

  3. I assume we need to add some README_ADVANCED docs about "latesttag"?

Yup, see 0b5816f

@redpanda-f redpanda-f marked this pull request as draft March 10, 2026 09:35
@BigLep BigLep changed the title feat: scenario tests for nightly CI runs [TRACIER - DO NO MERGE] feat: scenario tests for nightly CI runs Mar 10, 2026
@BigLep
Contributor

BigLep commented Mar 10, 2026

@redpanda-f : a couple things

  1. I assume at this point this PR should never be merged. It's serving as an overview of all the functionality that has been added/changed. (This is content that could be in the issue CI/Nightly End-to-End Validation of FOC as a whole #8, but given how this progressed, the content is in here - fine - no problem). I have retitled it to make clear this shouldn't get merged.
  2. For the Curio and filecoin-services dependencies, I think it would be nice to be on tags since we do tagged releases there, but I get that we haven't created tags in a while... chore: updated defaults for config.rs  #78 (review)

@BigLep BigLep moved this from ⌨️ In Progress to ⌚️ Issue awaiting PR merge in FOC Mar 12, 2026
@redpanda-f redpanda-f closed this Mar 13, 2026
@github-project-automation github-project-automation bot moved this from ⌚️ Issue awaiting PR merge to 🎉 Done in FOC Mar 13, 2026

Labels

None yet

Projects

Status: 🎉 Done

Development

Successfully merging this pull request may close these issues.

CI/Nightly End-to-End Validation of FOC as a whole

6 participants